Novel applications of multitask learning and multiple output regression to multiple genetic trait prediction

Authors

  • Dan He
  • David Kuhn
  • Laxmi Parida
Abstract

Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually formulated as a linear regression problem. In many cases, multiple traits are observed for the same set of samples and markers, and some of these traits may be correlated with each other. Modeling the traits jointly may therefore improve prediction accuracy. In this work, we view the multitrait prediction problem from a machine learning angle: as either a multitask learning problem or a multiple output regression problem, depending on whether or not the different traits share the same genotype matrix. We then adapted multitask learning algorithms and multiple output regression algorithms to the multitrait prediction problem, and proposed several strategies to reduce the least-squares error of their predictions. Our experiments show that modeling multiple traits together could improve prediction accuracy for correlated traits.

AVAILABILITY AND IMPLEMENTATION The programs we used are either public or obtained directly from the authors of the cited works, such as the MALSAR package (http://www.public.asu.edu/~jye02/Software/MALSAR/). The Avocado data set has not yet been published and is available upon request.

CONTACT [email protected].
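
The abstract does not spell out the algorithms, so the following is only an illustrative sketch of one standard way to let correlated traits share information: mean-regularized multitask least squares, a swapped-in technique that is not necessarily what the authors or MALSAR use. It assumes NumPy and genotypes coded as 0/1/2 allele counts; the function names `ridge` and `mean_regularized_mtl` and the penalty weights `lam` and `mu` are hypothetical. Each trait k gets its own genotype matrix, which covers the multitask case (different samples per trait); passing the same matrix for every trait gives the multiple output case.

```python
import numpy as np


def ridge(X, y, lam):
    """Single-trait ridge regression: argmin_w ||X w - y||^2 + lam * ||w||^2."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def mean_regularized_mtl(Xs, ys, lam, mu):
    """Jointly fit one marker-effect vector w_k per trait (hypothetical helper).

    Xs[k] is the (n_k, p) genotype matrix for trait k (0/1/2 allele counts).
    Minimizes
        sum_k ||Xs[k] w_k - ys[k]||^2 + lam * sum_k ||w_k||^2
              + mu * sum_k ||w_k - w_bar||^2,
    where w_bar is the mean of the w_k. The mu term pulls every trait's
    effects toward a shared mean; mu = 0 gives independent ridge fits.
    """
    T = len(Xs)
    p = Xs[0].shape[1]
    eye = np.eye(p)
    A = np.zeros((T * p, T * p))
    b = np.zeros(T * p)
    for k in range(T):
        rows = slice(k * p, (k + 1) * p)
        # Coupling from the mean penalty: -(mu / T) * I in every block of this row.
        for j in range(T):
            A[rows, j * p:(j + 1) * p] = -(mu / T) * eye
        # Per-trait normal-equation terms plus ridge and mean-penalty diagonal.
        A[rows, rows] += Xs[k].T @ Xs[k] + (lam + mu) * eye
        b[rows] = Xs[k].T @ ys[k]
    w = np.linalg.solve(A, b)
    return [w[k * p:(k + 1) * p] for k in range(T)]
```

Setting `mu=0` recovers separate per-trait ridge fits; increasing `mu` shrinks the traits' marker effects toward a common mean, which can help when traits are correlated and hurt when they are not. The explicit block system has (T·p)² entries, so this form only suits small illustrations, not genome-scale marker panels.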


Similar resources

Leveraging input and output structures for joint mapping of epistatic and marginal eQTLs

MOTIVATION As many complex disease and expression phenotypes are the outcome of intricate perturbations of the molecular networks underlying gene regulation, resulting from interdependent genome variations, association mapping of causal QTLs or expression quantitative trait loci must consider both additive and epistatic effects of multiple candidate genotypes. This problem poses a significant challeng...

Machine scheduling for multitask machining

Multitasking is an important part of today’s manufacturing plants. Multitask machine tools are capable of processing multiple operations at the same time by applying a different set of part and tool holding devices. Mill-turns are multitask machines with the ability to perform a variety of operations with considerable accuracy and agility. One critical factor in simultaneous machining is to cre...

An Intelligent Algorithm based Controller for Multiple Output DC-DC Converters with Voltage Mode Weighting Factor

Multiple output DC-DC converters are widely used in applications such as aerospace, industrial and medical equipment. The purpose of this paper is to present an intelligent control system for multiple output DC-DC converters. To this end, a double-ended forward DC-DC converter with three output voltages (+5 V/50 W, +15 V/45 W and -15 V/15 W) is considered and anal...

Multiple Fuzzy Regression Model for Fuzzy Input-Output Data

A novel approach to the problem of regression modeling for fuzzy input-output data is introduced. In order to estimate the parameters of the model, a distance on the space of interval-valued quantities is employed. By minimizing the sum of squared errors, a class of regression models is derived based on the interval-valued data obtained from the $\alpha$-level sets of fuzzy input-output data. Then,...

QSAR studies and application of genetic algorithm - multiple linear regressions in prediction of novel p2x7 receptor antagonists’ activity

Quantitative structure-activity relationship (QSAR) models were employed to predict the activity of P2X7 receptor antagonists. A data set consisting of 50 purine derivatives was used in model construction, with 40 of these compounds in the training set and 10 in the test set. A suitable group of calculated molecular descriptors was selected by employing stepwise multiple ...

Volume 32

Publication date: 2016